Consolidate filters and projections onto TableScan
#20061
base: main
// Wrap with ProjectionExec if projection is present and differs from scan output
// (either non-identity, or fewer columns due to filter-only columns)
The idea for #19387 is that we might be able to push down trivial expressions here, thus avoiding the need for any physical optimizer changes/rules.
LogicalPlan::Filter(filter) => {
    // Split AND predicates into individual expressions
    filters.extend(split_conjunction(&filter.predicate).into_iter().cloned());
}
Maybe we can drop this since filters are effectively pushed into TableScan now?
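For readers unfamiliar with the snippet above, here is a minimal sketch of what splitting a conjunction means. It uses a toy `Expr` enum and a hypothetical `split_conjunction_toy` helper, both invented for illustration; DataFusion's real `Expr` and `split_conjunction` are considerably richer.

```rust
// Toy expression type, for illustration only (not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Hypothetical stand-in for `split_conjunction`: flatten nested ANDs
/// into their leaf predicates, left to right.
fn split_conjunction_toy(expr: &Expr) -> Vec<&Expr> {
    match expr {
        Expr::And(left, right) => {
            let mut parts = split_conjunction_toy(left);
            parts.extend(split_conjunction_toy(right));
            parts
        }
        // Anything that is not an AND is a single predicate.
        other => vec![other],
    }
}
```

With this shape, `a AND b` yields the two predicates `a` and `b`, and a predicate with no `AND` yields itself.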
c4b139d to d0ff08f
538fafe to 37d40e8
37d40e8 to 0166985
@kosiew would you be open to reviewing this?
kosiew left a comment
Found an issue and stopping as there are other CI issues.
LogicalPlan::TableScan(scan) => {
    // Also extract filters from TableScan (where they may be pushed down)
    filters.extend(scan.filters.iter().cloned());
I think this would be a problem in scenarios like UPDATE target FROM source ... where the input plan contains TableScan nodes for both target and source. This function will extract filters from both tables and attempt to apply them to the target table (after stripping qualifiers).
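To make the concern concrete, here is a small sketch under simplifying assumptions: predicates are modelled as a hypothetical `Predicate` struct holding a qualifier, a column, and a condition string, and the helper names are invented for illustration. None of this is DataFusion's API.

```rust
// Hypothetical, simplified model of a qualified predicate such as
// `source.type = 'active'`.
#[derive(Debug, Clone, PartialEq)]
struct Predicate {
    table: String,     // qualifier, e.g. "source"
    column: String,    // e.g. "type"
    condition: String, // e.g. "= 'active'"
}

/// Naive collection: keep every predicate and drop its qualifier.
/// Predicates on `source` silently become predicates on the target.
fn strip_all_qualifiers(preds: &[Predicate]) -> Vec<String> {
    preds
        .iter()
        .map(|p| format!("{} {}", p.column, p.condition))
        .collect()
}

/// Safer variant: keep only predicates that belong to the target table
/// before dropping the qualifier.
fn strip_target_only(preds: &[Predicate], target: &str) -> Vec<String> {
    preds
        .iter()
        .filter(|p| p.table == target)
        .map(|p| format!("{} {}", p.column, p.condition))
        .collect()
}
```

Given `target.id > 100` and `source.type = 'active'`, the naive version returns both `id > 100` and `type = 'active'`, so the second predicate would be applied to the target table even though it was written against the source. The filtered version returns only `id > 100`.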
Hmm interesting. Is the problem the stripping of qualifiers? It seems to me that the current implementation is generally fragile: we should be taking into account the join aspect and only collect from subtrees of the join. In particular:
> explain format indent UPDATE "trg" SET col = 1 FROM src WHERE trg.id = src.id AND src.type = 'active' AND trg.id > 100;
+---------------+-----------------------------------------------------------------------+
| plan_type     | plan                                                                  |
+---------------+-----------------------------------------------------------------------+
| logical_plan  | Dml: op=[Update] table=[trg]                                          |
|               |   Projection: trg.id AS id, Utf8View("1") AS col                      |
|               |     Inner Join: trg.id = src.id                                       |
|               |       Filter: trg.id > Int32(100)                                     |
|               |         TableScan: trg projection=[id]                                |
|               |       Projection: src.id                                              |
|               |         Filter: src.type = Utf8View("active") AND src.id > Int32(100) |
|               |           TableScan: src projection=[id, type]                        |
It seems to me that we should do something like:
- Find the table scan corresponding to the table being updated.
- Walk up the tree until we hit a join / subquery (?) / other blocker.
- Collect all filters in that subtree (ideally once this PR is across the line they've all been pushed into the TableScan, so that becomes trivial)
I don't know how the DML stuff is supposed to handle more complex cases involving EXISTS, etc.
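A rough sketch of the three steps above, using a toy `Plan` enum rather than DataFusion's `LogicalPlan`. All names here are hypothetical, and only joins are treated as blockers; subqueries, EXISTS, and other cases are left open, as in the comment.

```rust
// Toy plan tree, for illustration only (not DataFusion's `LogicalPlan`).
#[derive(Debug)]
enum Plan {
    TableScan { table: String, filters: Vec<String> },
    Filter { predicate: String, input: Box<Plan> },
    Projection { input: Box<Plan> },
    Join { left: Box<Plan>, right: Box<Plan> },
}

/// Returns `Some((filters, blocked))` if the target scan is in this subtree.
/// `blocked` becomes true once a join has been crossed on the way up, after
/// which no further filters are collected.
fn collect(plan: &Plan, target: &str) -> Option<(Vec<String>, bool)> {
    match plan {
        Plan::TableScan { table, filters } => {
            // Step 1: locate the scan of the table being updated.
            (table.as_str() == target).then(|| (filters.clone(), false))
        }
        Plan::Filter { predicate, input } => {
            let (mut filters, blocked) = collect(input, target)?;
            // Step 3: collect filters only below the first blocker.
            if !blocked {
                filters.push(predicate.clone());
            }
            Some((filters, blocked))
        }
        Plan::Projection { input } => collect(input, target),
        Plan::Join { left, right } => {
            // Step 2: a join is a blocker; stop collecting above it.
            let (filters, _) =
                collect(left, target).or_else(|| collect(right, target))?;
            Some((filters, true))
        }
    }
}

/// Filters that apply to `target` alone; empty if the table is not found.
fn filters_for_target(plan: &Plan, target: &str) -> Vec<String> {
    collect(plan, target).map(|(f, _)| f).unwrap_or_default()
}
```

On a tree shaped like the plan above, calling `filters_for_target` with `"trg"` returns only `trg.id > 100`; the filters under the `src` side of the join and any filter sitting above the join are left out.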
Thank you, I will look into this. I'm struggling with the CI issue: https://github.com/apache/datafusion/actions/runs/21495116091/job/61927800829?pr=20061#step:4:7993 It seems like the only diff is the inclusion of a backtrace. I can reproduce locally if I run with
Related issues
Closes #19894. I think this will also help with #19387.